Dataflow: Add support for speculative taint flow. #17663

aschackmull · 2024-10-04T09:53:00Z

This adds support for speculative taint flow in the shared taint tracking library.

What is this?

This is a magic button (a dial, really) that you can turn to compute more taint flow in order to identify false negatives (FNs). So if you suspect an FN, e.g. you're failing to find the flow for a CVE, or you're getting zero results and suspect that we're missing some models, then try this!

How does it work?

Each language provides a huge candidate set of potential taint steps. The default set that I've implemented is simply a step from any argument to the return value (plus potential side effects on the this argument, if any) of any call for which we have neither an existing model nor a call target within the analyzed source.
The shared library will then execute the regular taint flow, but in addition it will allow speculative flow steps drawn from this candidate set up to a specified maximum number of such edges along a given path.
It will then report flow in the usual way, and the chosen speculative edges will be visible in the path explanation with the provenance label "Speculative". (In the VSCode plugin this shows up as "(step) Speculative".)
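The bounded search described above can be illustrated with a small, language-agnostic sketch. This is not the shared library's implementation (which is written in QL); it is a minimal Python model under the assumption that the graph is given as two hypothetical adjacency maps, `regular` and `speculative`, and that each search state records how many speculative edges have been used so far. The names `find_flow` and `_explain` are invented for this illustration.

```python
from collections import deque


def find_flow(regular, speculative, source, sink, limit):
    """Search for a path from source to sink that uses at most `limit`
    speculative edges. Returns a list of (from, to, provenance) steps,
    or None if no such path exists."""
    start = (source, 0)          # state = (node, speculative edges used)
    parent = {start: None}       # state -> (previous state, provenance)
    queue = deque([start])
    while queue:
        state = queue.popleft()
        node, used = state
        if node == sink:
            return _explain(parent, state)
        # Regular steps are always allowed and do not consume budget.
        steps = [(succ, used, "") for succ in regular.get(node, [])]
        # Speculative steps are allowed only while budget remains.
        if used < limit:
            steps += [(succ, used + 1, "Speculative")
                      for succ in speculative.get(node, [])]
        for succ, succ_used, label in steps:
            nxt = (succ, succ_used)
            if nxt not in parent:
                parent[nxt] = (state, label)
                queue.append(nxt)
    return None


def _explain(parent, state):
    """Rebuild the path, keeping the provenance label of every step."""
    path = []
    while parent[state] is not None:
        prev, label = parent[state]
        path.append((prev[0], state[0], label))
        state = prev
    path.reverse()
    return path


# Hypothetical graph: two unmodeled calls sit between the source and the sink.
regular = {"src": ["a"], "b": ["c"], "d": ["sink"]}
speculative = {"a": ["b"], "c": ["d"]}

for step in find_flow(regular, speculative, "src", "sink", 2) or []:
    print(step)
```

In this model each node is tracked once per count of speculative edges used, so the number of states grows with the limit; with a limit of 0 the search degenerates to the regular flow.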

I want to try it, show me how!

It's easy: just replace the application of the TaintTracking::Global module with TaintTracking::SpeculativeGlobal. For example, if you have

module MyQueryFlow = TaintTracking::Global<MyQueryConfig>;

then replace that with

int speculationLimit() { result = 10 }
module MyQueryFlow = TaintTracking::SpeculativeGlobal<MyQueryConfig, speculationLimit/0>;

The number you return from speculationLimit is the maximum number of speculative steps that may be used in a single path. A higher number gives more flow but worse performance: expect a slowdown by a factor roughly equal to the chosen limit.

Testing so far, and followup work for the individual language teams

I've tested this for Java and C# with a number of queries on their respective MRVA top 100, with good results and reasonable performance. For the remaining languages, the candidate set of edges might need further tweaking, e.g. to exclude things that happen to be calls but shouldn't be considered potential taint steps. For C#, for example, I had to reduce the set to "only" include method and constructor calls, i.e. no operator or property calls (I believe the latter are already covered as read/store steps).
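The default candidate set and the kind of per-language restriction mentioned for C# can be sketched as follows. This is a hedged Python model, not the QL code in this PR: the `Call` record, its fields, and `candidate_steps` are all hypothetical names, and the treatment of the `this` argument reflects only one reading of the description (each explicit argument may flow to the return value and, if the call has a qualifier, to `this`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    name: str
    kind: str                # e.g. "method", "constructor", "operator", "property"
    num_args: int
    has_qualifier: bool      # True if the call has a `this` argument
    has_model: bool          # True if a flow model already exists
    has_source_target: bool  # True if the callee is in the analyzed source


def candidate_steps(calls, allowed_kinds=None):
    """Yield (call name, from, to) candidate speculative steps.

    Calls that already have a model or a call target in the analyzed source
    are skipped. `allowed_kinds` is an optional per-language restriction."""
    for call in calls:
        if call.has_model or call.has_source_target:
            continue
        if allowed_kinds is not None and call.kind not in allowed_kinds:
            continue
        for i in range(call.num_args):
            yield (call.name, f"arg{i}", "return")
            if call.has_qualifier:
                # Potential side effect on the `this` argument.
                yield (call.name, f"arg{i}", "this")


# Hypothetical calls: only unmodeled library calls produce candidates.
calls = [
    Call("lib.parse", "method", 1, True, False, False),
    Call("op_Add", "operator", 2, False, False, False),
    Call("known", "method", 1, False, True, False),
    Call("local", "method", 1, False, False, True),
]

for step in candidate_steps(calls, allowed_kinds={"method", "constructor"}):
    print(step)
```

Passing `allowed_kinds=None` models the unrestricted default, while the set used above mirrors the "method and constructor calls only" restriction described for C#.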

shared/dataflow/codeql/dataflow/TaintTracking.qll

+
+  signature int speculationLimitSig();
+
+  private module AddSpeculativeTaintSteps<


shared/dataflow/codeql/dataflow/TaintTracking.qll

+  module SpeculativeFlow<DataFlow::ConfigSig Config, speculationLimitSig/0 speculationLimit>
+    implements DataFlow::GlobalFlowSig
+  {
+    private module Config0 implements DataFlowInternal::FullStateConfigSig {


shared/dataflow/codeql/dataflow/TaintTracking.qll

+      }
+    }
+
+    private module C implements DataFlowInternal::FullStateConfigSig {


shared/dataflow/codeql/dataflow/TaintTracking.qll

+    DataFlow::StateConfigSig Config, speculationLimitSig/0 speculationLimit> implements
+    DataFlow::GlobalFlowSig
+  {
+    private module Config0 implements DataFlowInternal::FullStateConfigSig {


shared/dataflow/codeql/dataflow/TaintTracking.qll

+      }
+    }
+
+    private module C implements DataFlowInternal::FullStateConfigSig {


geoffw0

Seems like a really useful tool. I've done something similar before during Swift development. It was crude (necessarily, as we didn't have provenance labels at the time), but helpful nevertheless.

I gather it's not (currently) possible to combine speculative taint flow with an existing flow state in the query?

Is it possible for any of the changes to affect performance when speculation is not being used?

aschackmull · 2024-10-15T11:22:43Z

> I gather it's not (currently) possible to combine speculative taint flow with an existing flow state in the query?

No, I can happily report that combining the two very much is supported!

> Is it possible for any of the changes to affect performance when speculation is not being used?

No.

aschackmull · 2024-10-16T12:47:05Z

For Ruby, I've now made some exclusions guided by the consistency check. Reviewers, please check whether those are reasonable.

michaelnebel

This is very neat!

csharp/ql/lib/semmle/code/csharp/dataflow/internal/TaintTrackingPrivate.qll

hvitved

LGTM, only a renaming suggestion.

hvitved · 2024-10-29T12:48:10Z

shared/dataflow/codeql/dataflow/TaintTracking.qll

+   * Constructs a global taint tracking computation that also allows a given
+   * maximum number of speculative taint steps.
+   */
+  module SpeculativeFlow<DataFlow::ConfigSig Config, speculationLimitSig/0 speculationLimit>


Perhaps it should be named SpeculativeGlobal instead?

aschackmull · 2024-10-30

I'm fine with that. I'll push a rename shortly.

hvitved · 2024-10-29T12:48:25Z

shared/dataflow/codeql/dataflow/TaintTracking.qll

+   * Constructs a global taint tracking computation using flow state that also
+   * allows a given maximum number of speculative taint steps.
+   */
+  module SpeculativeFlowWithState<


Same renaming suggestion.

michaelnebel

C# LGTM!

aschackmull requested review from a team as code owners October 4, 2024 09:53

github-actions bot added C# C++ Java Python Go Ruby Swift DataFlow Library labels Oct 4, 2024

github-advanced-security bot found potential problems Oct 4, 2024


geoffw0 reviewed Oct 7, 2024


aschackmull added 12 commits October 16, 2024 14:35

Dataflow: add plumbing for adding provenance to state-steps.

c80627a

Dataflow: Add speculative flow modules.

7d12329

Java: Add support for speculative taint flow.

8b99154

Dataflow: Add consistency check.

6c6b606

C#: Add support for speculative taint flow.

7b43100

Ruby: Add tentative support for speculative taint flow.

8eb0cb4

Python: Add tentative support for speculative taint flow.

7497d95

Swift: Add tentative support for speculative taint flow.

635071f

Go: Add tentative support for speculative taint flow.

fae7175

C/C++: Add tentative support for speculative taint flow.

4e8a4a5

C/C++: Accept test changes.

9ca8a27

Add qldoc.

c20f12f

Ruby: Exclude some cases that are unlikely library calls.

42d35f8

aschackmull force-pushed the dataflow/speculative-flow branch from 605452b to 42d35f8 Compare October 16, 2024 12:35

Python: Add workaround.

4153a83

michaelnebel reviewed Oct 21, 2024


csharp/ql/lib/semmle/code/csharp/dataflow/internal/TaintTrackingPrivate.qll (resolved)

csharp/ql/lib/semmle/code/csharp/dataflow/internal/TaintTrackingPrivate.qll (resolved)

hvitved reviewed Oct 29, 2024


Dataflow: Rename SpeculativeFlow to SpeculativeGlobal.

570b042

aschackmull added the no-change-note-required label ("This PR does not need a change note") Oct 30, 2024

michaelnebel approved these changes Oct 30, 2024


hvitved approved these changes Oct 30, 2024


aschackmull merged commit b556590 into github:main Oct 31, 2024
60 of 61 checks passed

aschackmull deleted the dataflow/speculative-flow branch October 31, 2024 07:12

MathiasVP mentioned this pull request Nov 8, 2024

PS: Add AST and CFG classes for operator & and add environment variable reads as local flow sources microsoft/codeql#136

Merged
